Abstract

For a general univariate "errors-in-variables" model, the maximum
likelihood estimate of the parameter vector (assuming normality of the
errors), which has been described in the literature, can be expressed in
an alternative form. In this form, the estimate is computationally
simpler, and deeper investigation of its properties is facilitated. In
particular, we demonstrate that, under conditions a good deal less
restrictive than those which have been previously assumed, the estimate
is weakly consistent.
1.  Introduction.

The estimation of linear regression parameters when some variables
cannot be observed exactly because of measurement or observation error
is a problem with a long history in the statistical literature, yet one
which has received considerable recent emphasis. We consider a general
"errors-in-variables" model in which some subset of the variables is
observed with error (much of the literature concerns the case in which
all variables are subject to error, with particular emphasis on models
with just one independent variable; see Moran (1971) and Kendall and
Stuart (1961, Chapter 27)).

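Our model is

$$\underset{n \times 1}{Y} = \underset{n \times p_1}{X_1}\,\underset{p_1 \times 1}{\beta_1} + \underset{n \times p_2}{X_2}\,\underset{p_2 \times 1}{\beta_2} + e,$$

$$\underset{n \times p_2}{C} = X_2 + U,$$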


where $\beta_1$ and $\beta_2$ are vectors of regression parameters to be
estimated, $Y$ and $C$ consist of observable random variables, $X_1$ and
$X_2$ consist of constants but $X_1$ is known and $X_2$ is not, and $e$
and $U$ are composed of random variables such that the rows of
$[U \;\; e]$ are i.i.d. with mean zero and unknown non-singular
covariance matrix

$$\Sigma = \begin{bmatrix} \Sigma_u & \Sigma_{eu} \\ \Sigma_{eu}' & \sigma_e^2 \end{bmatrix},$$

where $\sigma_e^2$ denotes the variance of each element of $e$. (Models
such as this with the independent variables being constants have
generally been referred to under the title "linear functional
relationship." A related model in which the variables are stochastic has
been called a "linear structural relationship"; see Madansky (1959) for
discussion.) Although in our discussion $n$ will vary, there should be no
confusion if we do not subscript the matrices involved.
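
To make the model concrete, the following minimal Python sketch draws one
sample (Y, C) from it. The function name `simulate_eiv`, the particular
design matrices and parameter values, and the use of normally distributed
errors are illustrative assumptions, not part of the original text; the
model itself only requires i.i.d. rows of [U e] with mean zero and
covariance matrix Sigma.

```python
import numpy as np

def simulate_eiv(X1, X2, beta1, beta2, Sigma, rng):
    """Draw (Y, C) from Y = X1 b1 + X2 b2 + e, C = X2 + U.

    Rows of [U e] are i.i.d. N(0, Sigma), with Sigma of size (p2+1) x (p2+1).
    Normality is an assumption of this sketch only.
    """
    n, p2 = X2.shape
    errors = rng.multivariate_normal(np.zeros(p2 + 1), Sigma, size=n)
    U, e = errors[:, :p2], errors[:, p2]
    Y = X1 @ beta1 + X2 @ beta2 + e      # response, length n
    C = X2 + U                           # error-contaminated version of X2
    return Y, C

rng = np.random.default_rng(0)
n = 200
X1 = np.ones((n, 1))                     # known constants (here an intercept)
X2 = np.linspace(0.0, 5.0, n)[:, None]   # unknown constants, seen only through C
Sigma = np.array([[0.5, 0.1],
                  [0.1, 1.0]])           # hypothetical error covariance
Y, C = simulate_eiv(X1, X2, np.array([1.0]), np.array([2.0]), Sigma, rng)
```

Only Y, C and X1 would be available to the analyst; X2 is used here solely
to generate the data.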


We consider maximum likelihood estimation under the assumption
that the errors are jointly normally distributed. It is well known that
the supremum of the likelihood is infinite unless we impose additional
structure on $\Sigma$ (and furthermore, that under any conditions on
$\Sigma$ which yield a solution to the likelihood equations, the estimate
obtained is the same as that obtained by the method of weighted least
squares). The assumption most frequently made in the literature, and one
which we will adopt, is


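(1.1)  $\Sigma = \sigma^2 \Sigma_o = \sigma^2 \begin{bmatrix} \Sigma_{uo} & \Sigma_{euo} \\ \Sigma_{euo}' & \sigma_{eo}^2 \end{bmatrix}$, with $\Sigma_o$ known.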


The most detailed results along these lines can be obtained from the work
of Gleser and his students, who considered multivariate regression
models. In our model, let

(1.2)  $W = [C \;\; Y]' R \, [C \;\; Y]$ with $R = I - X_1 (X_1' X_1)^{-1} X_1'$,

       $\theta = \lambda_{p_2+1}(\Sigma_o^{-1} W)$ ($\lambda_i(A)$ denotes the $i$th largest eigenvalue of $A$),

       $g' = (g_1' \;\; g_2)$, with $g_2$ of dimension $1 \times 1$, is an eigenvector associated with $\theta$.

Healy (1975) has shown that if $g_2 \neq 0$, then the MLE's of $\beta_1$
and $\beta_2$ exist and are given by:

(1.3)  $\hat\beta_2 = -g_2^{-1} g_1$,

       $\hat\beta_1 = (X_1' X_1)^{-1} X_1' (Y - C \hat\beta_2)$.
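
The eigenvector recipe of (1.2) and (1.3) can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not the
authors' implementation: the function name `mle_eigenvector_form` is
hypothetical, Sigma0 stands for the known matrix in (1.1), and the
eigenproblem W g = theta * Sigma0 g is solved through a Cholesky factor of
Sigma0 so that a symmetric eigen-solver can be used.

```python
import numpy as np

def mle_eigenvector_form(Y, C, X1, Sigma0):
    """Return (beta1_hat, beta2_hat, theta) following (1.2)-(1.3)."""
    n, p2 = C.shape
    # R projects onto the orthogonal complement of the column space of X1.
    R = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    CY = np.column_stack([C, Y])
    W = CY.T @ R @ CY
    # With Sigma0 = L L', the matrix L^{-1} W L^{-T} is symmetric and has the
    # same eigenvalues as Sigma0^{-1} W.
    L = np.linalg.cholesky(Sigma0)
    A = np.linalg.solve(L, np.linalg.solve(L, W).T)
    vals, vecs = np.linalg.eigh((A + A.T) / 2.0)   # eigenvalues in ascending order
    theta = vals[0]                                 # smallest eigenvalue of Sigma0^{-1} W
    g = np.linalg.solve(L.T, vecs[:, 0])            # eigenvector of Sigma0^{-1} W for theta
    g1, g2 = g[:p2], g[p2]
    beta2_hat = -g1 / g2                            # (1.3), requires g2 != 0
    beta1_hat = np.linalg.solve(X1.T @ X1, X1.T @ (Y - C @ beta2_hat))
    return beta1_hat, beta2_hat, theta

# Noise-free sanity check (U = 0, e = 0): the true coefficients are recovered
# and theta is numerically zero.
rng = np.random.default_rng(0)
n = 50
X1 = np.ones((n, 1))
X2 = rng.normal(size=(n, 2))
beta1, beta2 = np.array([1.0]), np.array([2.0, -3.0])
Y0 = X1 @ beta1 + X2 @ beta2
b1, b2, theta = mle_eigenvector_form(Y0, X2, X1, np.eye(3))
```

The noise-free case is used only because its answer is known in advance;
with genuine measurement error the same call applies unchanged.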


In Section 2, we demonstrate that the MLE can be expressed in
alternative forms which are easier to interpret than (1.3), as well as
computationally simpler. These "simpler forms" also facilitate deeper
investigation of certain properties of the estimate. In Section 3, we
consider one such aspect: we demonstrate that $\hat\beta_1$ and
$\hat\beta_2$ are weakly consistent estimates under conditions weaker
than those which have been previously assumed.


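2.  The MLE under Normality.

In this section, we will make use of the following obvious notation:

$$X = [X_1 \;\; X_2], \qquad C^* = [X_1 \;\; C], \qquad U^* = [0 \;\; U], \qquad \beta' = (\beta_1' \;\; \beta_2'), \qquad p = p_1 + p_2;$$

$$\underset{p \times p}{\Sigma_u^*} = \begin{bmatrix} 0 & 0 \\ 0 & \Sigma_u \end{bmatrix}, \qquad \Sigma_{eu}^* = \begin{bmatrix} 0 \\ \Sigma_{eu} \end{bmatrix}, \qquad \Sigma^* = \begin{bmatrix} \Sigma_u^* & \Sigma_{eu}^* \\ \Sigma_{eu}^{*\prime} & \sigma_e^2 \end{bmatrix};$$

we define $\Sigma_o^*$, $\Sigma_{uo}^*$, $\Sigma_{euo}^*$ analogously.
Also, let $H = [C^* \;\; Y]' [C^* \;\; Y]$. The main result of this
section is: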


Theorem 1.  In our model, if the joint distribution of the errors is
absolutely continuous with respect to Lebesgue measure, then the
normality-MLE of $\beta$ exists almost surely and is given by

(2.1)  $\hat\beta_2 = (C'RC - \theta\Sigma_{uo})^{-1} (C'RY - \theta\Sigma_{euo})$,

       $\hat\beta_1 = (X_1'X_1)^{-1} X_1' (Y - C\hat\beta_2)$,

with $\theta$ and $R$ given by (1.2); we also have

(2.2)  $\hat\beta = (C^{*\prime} C^* - \theta\Sigma_{uo}^*)^{-1} (C^{*\prime} Y - \theta\Sigma_{euo}^*)$,

with $\theta^{-1} = \gamma$, the largest root of
$|\Sigma_o^* - \gamma H| = 0$.


In the form (2.2), $\hat\beta$ can be viewed as a modification of the
ordinary least squares regression estimate, which is known to be
inconsistent in the errors-in-variables (E.I.V.) case. In fact, the
estimate seems to operate much like the "method-of-moments" estimate
described by Fuller (1980). In an E.I.V. model in which $\Sigma_u$ and
$\Sigma_{eu}$ can be consistently and independently estimated but are
otherwise unknown, Fuller has proposed estimates such as

$$\tilde\beta = (C^{*\prime} C^* - n\hat\Sigma_u^*)^{-1} (C^{*\prime} Y - n\hat\Sigma_{eu}^*).$$

Under the assumption that $n^{-1} X'X$ converges to a finite matrix,
Healy (1975) showed that $n^{-1}\theta$ consistently estimates
$\sigma^2$; hence $n^{-1}\theta\Sigma_{uo} \xrightarrow{p} \Sigma_u$ in
our model. Thus, while Fuller's method requires an "external" variance
estimate, the maximum likelihood approach in effect produces its own
"internal" estimate. Of course we do not get this for free; the price we
have paid is the additional structure that we have imposed upon
$\Sigma$.
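
As a numerical illustration of Theorem 1, the sketch below evaluates the
partitioned form (2.1) and the unpartitioned form (2.2) on one simulated
data set and lets the two be compared. All function names, the design, and
the parameter values are hypothetical choices made for this example. In the
second function, theta is obtained as the reciprocal of the largest root
gamma of |Sigma*_o - gamma H| = 0, which is this sketch's reading of the
statement accompanying (2.2).

```python
import numpy as np

def whitened(A, B):
    """L^{-1} A L^{-T} for B = L L' and symmetric A; same eigenvalues as B^{-1} A."""
    L = np.linalg.cholesky(B)
    M = np.linalg.solve(L, np.linalg.solve(L, A).T)
    return (M + M.T) / 2.0

def mle_form_21(Y, C, X1, Sigma0):
    """Partitioned form (2.1), with theta taken from (1.2)."""
    n, p2 = C.shape
    R = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    CY = np.column_stack([C, Y])
    theta = np.linalg.eigvalsh(whitened(CY.T @ R @ CY, Sigma0))[0]
    b2 = np.linalg.solve(C.T @ R @ C - theta * Sigma0[:p2, :p2],
                         C.T @ R @ Y - theta * Sigma0[:p2, p2])
    b1 = np.linalg.solve(X1.T @ X1, X1.T @ (Y - C @ b2))
    return np.concatenate([b1, b2]), theta

def mle_form_22(Y, C, X1, Sigma0):
    """Unpartitioned form (2.2), with theta = 1 / (largest root gamma)."""
    p1 = X1.shape[1]
    p = p1 + C.shape[1]
    Cs = np.column_stack([X1, C])               # C* = [X1 C]
    CsY = np.column_stack([Cs, Y])
    H = CsY.T @ CsY
    S0 = np.zeros((p + 1, p + 1))               # Sigma*_o: Sigma0 padded with zeros
    S0[p1:, p1:] = Sigma0
    gamma = np.linalg.eigvalsh(whitened(S0, H))[-1]
    theta = 1.0 / gamma
    beta = np.linalg.solve(Cs.T @ Cs - theta * S0[:p, :p],
                           Cs.T @ Y - theta * S0[:p, p])
    return beta, theta

rng = np.random.default_rng(0)
n, beta1, beta2 = 100, np.array([1.0]), np.array([2.0, -3.0])
X1 = np.ones((n, 1))
X2 = rng.normal(size=(n, 2))
Sigma0 = np.array([[1.0, 0.2, 0.1],
                   [0.2, 1.0, 0.0],
                   [0.1, 0.0, 1.0]])
err = rng.multivariate_normal(np.zeros(3), 0.25 * Sigma0, size=n)  # sigma^2 = 0.25
C = X2 + err[:, :2]
Y = X1 @ beta1 + X2 @ beta2 + err[:, 2]
est_21, theta_21 = mle_form_21(Y, C, X1, Sigma0)
est_22, theta_22 = mle_form_22(Y, C, X1, Sigma0)
```

Up to rounding error, `est_21` and `est_22` should coincide, as should the
two values of theta; form (2.2) needs no explicit partition of the
regressors beyond the zero padding of Sigma0.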

In proving Theorem 1, we will make use of the following result:

Lemma 1.  Under the conditions of Theorem 1,
$\theta = \lambda_{p_2+1}(\Sigma_o^{-1} W)$ is not an eigenvalue of
$C'RC$ with probability one.
Proof.  Let

$$G = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix}, \qquad G_{22}: 1 \times 1,$$

be the matrix of normalized eigenvectors associated with the (ordered)
eigenvalues of $\Sigma_o^{-1} W$, with $F = G^{-1}$ partitioned
similarly. Thus

(2.3)  $\Sigma_o^{-1} W = G D F$

with

$$D = \begin{bmatrix} \Lambda & 0 \\ 0 & \theta \end{bmatrix}, \qquad \Lambda = \mathrm{diag}(\lambda_1(\Sigma_o^{-1} W), \ldots, \lambda_{p_2}(\Sigma_o^{-1} W)).$$

Equation (2.3) implies that

(2.4)  $C'RC = (\Sigma_{uo} G_{11} + \Sigma_{euo} G_{21})(\Lambda - \theta I_{p_2}) F_{11} + \theta I_{p_2}$.

From Gleser (1981), we infer that
$\Sigma_{uo} G_{11} + \Sigma_{euo} G_{21}$ and $F_{11}$ are non-singular
a.s. if the error distribution is absolutely continuous; it follows from
a result of Okamoto (1973) that the eigenvalues of $\Sigma_o^{-1} W$ are
distinct with probability one (all we need is
$\theta \neq \lambda_{p_2}(\Sigma_o^{-1} W)$), in which case
$\Lambda - \theta I_{p_2}$ is non-singular. The result follows since
(2.4) implies that $C'RC - \theta I_{p_2}$ is non-singular a.s. □


Proof of Theorem 1.  From the definition of $\theta$ and $G$,

(2.5)  $\begin{bmatrix} C'RC - \theta\Sigma_{uo} & C'RY - \theta\Sigma_{euo} \\ Y'RC - \theta\Sigma_{euo}' & Y'RY - \theta\sigma_{eo}^2 \end{bmatrix} \begin{bmatrix} G_{12} \\ G_{22} \end{bmatrix} = 0$,

where $\sigma_{eo}^2$ denotes the lower right-hand element of
$\Sigma_o$. Gleser (1981) has shown that $G_{22} \neq 0$ a.s., in which
case the MLE exists. As mentioned above, $\theta$ has multiplicity one
a.s., so the left-hand matrix has rank $p_2$ w.p. 1, and solutions to
(2.5) will be determined by equations corresponding to any $p_2$
linearly independent rows of that matrix. In light of Lemma 1, the first
$p_2$ rows will do:

(2.6)  $(C'RC - \theta\Sigma_{uo}) G_{12} + (C'RY - \theta\Sigma_{euo}) G_{22} = 0$

       $\Rightarrow \; -G_{12} G_{22}^{-1} = (C'RC - \theta\Sigma_{uo})^{-1} (C'RY - \theta\Sigma_{euo})$.

By (1.3), this is $\hat\beta_2$, which demonstrates (2.1).

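For the second part of the theorem, note first that

$$H^{-1} \Sigma_o^* = \begin{bmatrix} 0 & \ast \\ 0 & W^{-1} \Sigma_o \end{bmatrix},$$

where $\ast$ denotes a block whose form is immaterial, from which it
follows that $\theta$ of eq. (2.2) is the same as that of (1.2) (in this
part of the theorem, we want to express $\hat\beta$ in a form which does
not explicitly refer to our partitions of the matrices involved).

Now according to (2.2), and writing
$Q = (C'RC - \theta\Sigma_{uo})^{-1}$,

$$\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = \begin{bmatrix} X_1'X_1 & X_1'C \\ C'X_1 & C'C - \theta\Sigma_{uo} \end{bmatrix}^{-1} \begin{bmatrix} X_1'Y \\ C'Y - \theta\Sigma_{euo} \end{bmatrix}$$

$$= \begin{bmatrix} (X_1'X_1)^{-1} + (X_1'X_1)^{-1} X_1' C Q C' X_1 (X_1'X_1)^{-1} & -(X_1'X_1)^{-1} X_1' C Q \\ -Q C' X_1 (X_1'X_1)^{-1} & Q \end{bmatrix} \begin{bmatrix} X_1'Y \\ C'Y - \theta\Sigma_{euo} \end{bmatrix}$$

$$= \begin{bmatrix} (X_1'X_1)^{-1} X_1'Y - (X_1'X_1)^{-1} X_1' C \, [\, Q(C'Y - \theta\Sigma_{euo}) - Q C' X_1 (X_1'X_1)^{-1} X_1'Y \,] \\ Q(C'Y - \theta\Sigma_{euo}) - Q C' X_1 (X_1'X_1)^{-1} X_1'Y \end{bmatrix},$$

which implies

$$\hat\beta_2 = Q(C'RY - \theta\Sigma_{euo}) = (C'RC - \theta\Sigma_{uo})^{-1} (C'RY - \theta\Sigma_{euo}),$$

$$\hat\beta_1 = (X_1'X_1)^{-1} X_1' (Y - C\hat\beta_2).$$

These agree with (1.3) and (2.6). □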

3.  Consistency.

Various results concerning weak and strong consistency of $\hat\beta$ in
our model and related models have been described by Healy (1975),
Bhargava (1975), and Gleser (1981). Generally, all require that

(3.1)  $\lim_{n \to \infty} n^{-1} X'X$ exists and is positive definite.

Such a condition on $X$ is much stronger than conditions which have been
shown to be sufficient for consistency of the usual linear regression
estimate (the special case of our model with $p_2 = 0$). In recent years,
results of increasing strength and generality on this matter have been
produced: see Eicker (1963), Drygas (1976), Anderson and Taylor (1976),
Lai et al. (1979). Conditions on the errors vary somewhat among these
papers, but the condition on $X$ which is crucial to all of them is

(3.2)  $\lambda_p(X'X) \to \infty$ as $n \to \infty$.

We would like to find conditions "intermediate" between (3.1) and (3.2)
which are sufficient for weak consistency of $\hat\beta$.

Theorem 2.  If the following conditions on $X$ are satisfied:

(A.1)  $n^{-1/2} \lambda_p(X'X) \to \infty$ as $n \to \infty$,

(A.2)  $\lambda_1(X'X) \, \lambda_p^{-2}(X'X) \to 0$ as $n \to \infty$,

and the joint distribution of the errors possesses finite fourth moments,
then $\hat\beta \xrightarrow{p} \beta$ as $n \to \infty$. (Note that we
are obtaining consistency without using the assumption that the errors
are normally distributed.)

The following simple lemma will be useful:

Lemma 2.  (i) $\lambda_1(X_2'RX_2) \le \lambda_1(X'X)$;

(ii) letting $(X'X)^{-1} = [L_1 \;\; L_2]$, with $L_1$ of dimension
$p \times p_1$ and $L_2$ of dimension $p \times p_2$,
$\lambda_1(L_2 L_2') \le \lambda_p^{-2}(X'X)$.

Proof.  Since $(X_2'RX_2)^{-1}$ is the lower right-hand submatrix of
$(X'X)^{-1}$,

$$\lambda_p((X'X)^{-1}) = \inf_{\|z\|=1} z'(X'X)^{-1} z \;\le\; \inf_{\|z_2\|=1} z_2'(X_2'RX_2)^{-1} z_2 = \lambda_{p_2}((X_2'RX_2)^{-1})$$

$$\Rightarrow \; \lambda_1(X'X) \ge \lambda_1(X_2'RX_2).$$

Noting that the non-zero eigenvalues of $L_2'L_2$ and $L_2L_2'$ are
identical, (ii) follows similarly since $L_2'L_2$ is a lower right-hand
submatrix of $(X'X)^{-2}$.
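
Both inequalities of Lemma 2 are easy to check numerically for a given
design matrix. The sketch below is illustrative only: the function name
`lemma2_check` and the random design are assumptions of this example, and a
single random draw is of course a spot check rather than a proof.

```python
import numpy as np

def lemma2_check(X, p1):
    """Return ((lhs, rhs) of part (i), (lhs, rhs) of part (ii)) for X = [X1 X2]."""
    n = X.shape[0]
    X1, X2 = X[:, :p1], X[:, p1:]
    R = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    XtX = X.T @ X
    lam = np.linalg.eigvalsh(XtX)          # ascending: lam[0] = lambda_p, lam[-1] = lambda_1
    # Part (i): lambda_1(X2' R X2) <= lambda_1(X'X).
    lhs_i = np.linalg.eigvalsh(X2.T @ R @ X2)[-1]
    # Part (ii): L2 holds the last p2 columns of (X'X)^{-1};
    # lambda_1(L2 L2') <= lambda_p(X'X)^{-2}.
    L2 = np.linalg.inv(XtX)[:, p1:]
    lhs_ii = np.linalg.eigvalsh(L2 @ L2.T)[-1]
    return (lhs_i, lam[-1]), (lhs_ii, lam[0] ** -2)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))               # hypothetical design with p1 = 2, p2 = 3
(part_i_lhs, part_i_rhs), (part_ii_lhs, part_ii_rhs) = lemma2_check(X, p1=2)
```

In each returned pair the first entry should not exceed the second, up to
rounding error.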

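Proof of Theorem 2.

$$\hat\beta = (C^{*\prime} C^* - \theta\Sigma_{uo}^*)^{-1} (C^{*\prime} Y - \theta\Sigma_{euo}^*)$$

$$= \left( I_p + (X'X)^{-1} \left( X'U^* + U^{*\prime}X + (U^{*\prime}U^* - n\Sigma_u^*) + (n\sigma^2 - \theta)\Sigma_{uo}^* \right) \right)^{-1}$$

$$\times \, (X'X)^{-1} \left( X'Y + U^{*\prime}X\beta + (U^{*\prime}e - n\Sigma_{eu}^*) + (n\sigma^2 - \theta)\Sigma_{euo}^* \right).$$

Clearly, it will suffice to show that:

(i)  $(X'X)^{-1} X'U^* \xrightarrow{p} 0$

(ii)  $(X'X)^{-1} U^{*\prime}X \xrightarrow{p} 0$

(iii)  $(X'X)^{-1} (U^{*\prime}U^* - n\Sigma_u^*) \xrightarrow{p} 0$

(iv)  $|n\sigma^2 - \theta| \, (X'X)^{-1} \xrightarrow{p} 0$

(v)  $(X'X)^{-1} X'Y \xrightarrow{p} \beta$

(vi)  $(X'X)^{-1} (U^{*\prime}e - n\Sigma_{eu}^*) \xrightarrow{p} 0$.

Eicker (1963) has shown (v) when $\lambda_p(X'X) \to \infty$, which is of
course true by (A.1); (i) also follows immediately from his work under
the same condition.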


Note that $U^{*\prime}U^* - n\Sigma_u^* = O_p(n^{1/2})$ if the errors
have finite fourth moments, so (iii) holds if
$(X'X)^{-1} = o(n^{-1/2})$, which follows from (A.1). The same argument
demonstrates (vi). With $L_2$ as in Lemma 2,
$(X'X)^{-1} U^{*\prime}X = L_2 U'X$; the $(i,j)$th element has mean zero
and variance $\sum_k X_{kj}^2 \cdot P_i' \Sigma_u P_i$, where $P_i$ is
the $i$th column of $L_2'$. Thus (ii) is satisfied if
$\max \mathrm{diag}(X'X) \cdot \max \mathrm{diag}(L_2 L_2') \to 0$; this
is seen to follow from (A.2) using Lemma 2(ii).

Letting $k$ henceforth denote $\lambda_p^{-1}(X'X)$, we need only
demonstrate (iv), which is equivalent to
$k(\theta - n\sigma^2) \xrightarrow{p} 0$. Note that
$\theta = \lambda_{p_2+1}(\Sigma_o^{-1} W)$ if and only if
$k(\theta - n\sigma^2) = \lambda_{p_2+1}(k(\Sigma_o^{-1} W - n\sigma^2 I_{p_2+1}))$;
we will show that this converges to zero in probability. Let

$$D = k\Sigma_o^{-1} \begin{bmatrix} X_2'RX_2 & X_2'RX_2\beta_2 \\ \beta_2'X_2'RX_2 & \beta_2'X_2'RX_2\beta_2 \end{bmatrix}.$$

As the product of a positive definite matrix and a positive semi-definite
matrix of rank $p_2$, $D$ has $p_2$ positive eigenvalues, its other
eigenvalue being zero. Now

$$k(\Sigma_o^{-1} W - n\sigma^2 I_{p_2+1}) - D = k\Sigma_o^{-1} \begin{bmatrix} X_2'RU + U'RX_2 & U'RX_2\beta_2 + X_2'Re \\ \beta_2'X_2'RU + e'RX_2 & \beta_2'X_2'Re + e'RX_2\beta_2 \end{bmatrix}$$

$$+ \; k\Sigma_o^{-1} \left( [U \;\; e]'[U \;\; e] - n\Sigma \right) + k\Sigma_o^{-1} [U \;\; e]'(R - I_n)[U \;\; e]$$

$$= M_1 + M_2 + M_3, \text{ say.}$$

Using arguments essentially the same as before,
$M_1 \xrightarrow{p} 0$ by (A.2) and Lemma 2(i). $M_2$ does likewise
since $k = o(n^{-1/2})$. Finally, noting that $I_n - R$ is idempotent, we
deduce that $E(M_3) = -\sigma^2 k p_1 I_{p_2+1}$. The diagonal elements
of the positive semi-definite matrix $-\Sigma_o M_3$ are non-negative
with expectations going to zero; thus they are $o_p(1)$ themselves.
Consequently, $M_3 \xrightarrow{p} 0$.

Since eigenvalues are continuous functions of the elements of a matrix,
it follows from the above discussion that

$$\lambda_{p_2+1}(k(\Sigma_o^{-1} W - n\sigma^2 I_{p_2+1})) \xrightarrow{p} \lambda_{p_2+1}(D) = 0,$$

and hence $k(\theta - n\sigma^2) \xrightarrow{p} 0$. □


Our assumptions (A.1) and (A.2) are intermediate in the sense
mentioned earlier: either one implies (3.2), while both are implied by
(3.1). Condition (A.1) requires that $X'X$ "gets large" at a faster rate
than does (3.2) (it can be seen by considering the demonstration of
(iii), e.g., in the proof of Theorem 2, that (3.2) is too weak a
condition for our model). A simple example in which (3.1) fails to hold,
but where (A.1) suffices, is a situation where $p = p_2 = 1$, and the
independent variable varies linearly with $n$. Condition (A.2) will also
hold much more generally than (3.1); it is satisfied, for example, if
(A.1) holds and the independent variables are bounded. Finally, while our
requirement of fourth moments of the errors is not particularly
restrictive, we could weaken it if we were willing to strengthen (A.1)
(for example, we would require only a finite $(2+\delta)$th moment,
$0 \le \delta < 2$, if
$n^{-(1+\delta/2)^{-1}} \lambda_p(X'X) \to \infty$ as $n \to \infty$).
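
The example with $p = p_2 = 1$ and an independent variable growing linearly
in $n$ can be examined directly. The sketch below takes $x_i = i$ as one
concrete instance of "varies linearly with n" (an assumption of this
example) and reports the quantities appearing in (3.1), (A.1) and (A.2); the
function name `design_diagnostics` and the dictionary keys are hypothetical.

```python
import numpy as np

def design_diagnostics(X):
    """Quantities whose limits are the subject of (3.1), (A.1) and (A.2)."""
    n = X.shape[0]
    lam = np.linalg.eigvalsh(X.T @ X)      # ascending order
    lam_p, lam_1 = lam[0], lam[-1]
    return {
        "n_inv_lam_1": lam_1 / n,          # stays bounded when (3.1) holds
        "a1": lam_p / np.sqrt(n),          # must diverge for (A.1)
        "a2": lam_1 / lam_p ** 2,          # must tend to zero for (A.2)
    }

rows = []
for n in (10, 100, 1000):
    X = np.arange(1, n + 1, dtype=float)[:, None]   # x_i = i
    rows.append(design_diagnostics(X))
```

Here $X'X = n(n+1)(2n+1)/6$, so `n_inv_lam_1` grows without bound (no
finite limit of the kind required by (3.1) exists), while `a1` increases
and `a2` decreases with $n$, in line with (A.1) and (A.2).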